Abstract

Endometrial cancer (EC) contributes substantially to the total burden of cancer morbidity and mortality in the United States.
Family history is a known risk factor for EC, so genetic factors may play a role in EC pathogenesis. Three previous genome-wide
association studies (GWAS) have found only one locus associated with EC, suggesting that common variants with large
effects may not contribute greatly to EC risk. Alternatively, we hypothesize that rare variants may contribute to EC risk. We
conducted an exome-wide association study (EXWAS) of EC using the Infinium HumanExome BeadChip in order to identify 
rare variants associated with EC risk. We successfully genotyped 177,139 variants in a multiethnic population of 1,055 cases 
and 1,778 controls from four studies that were part of the Epidemiology of Endometrial Cancer Consortium (E2C2). No 
variants reached global significance in the study, suggesting that more power is needed to detect modest associations 
between rare genetic variants and risk of EC. 
Introduction

Endometrial cancer (EC), a cancer of the uterine epithelial 
lining that typically occurs near or after menopause, is the most 
common cancer of the female reproductive organs and the 10th
leading cause of cancer death in women in the developed world
[1-3]. EC is strongly associated with estrogen-only post-meno- 
pausal hormone therapy [4,5] and excess body weight [6] due to 
increased aromatization of C-19 steroids by excess adipose tissue 
[7]. These risk factors support the "unopposed estrogen" 
hypothesis in which EC may develop because of the unchecked
mitogenic effects of estrogen in the absence of sufficient 
progesterone [8]. Some studies have shown that family history 
increases risk two to three-fold in younger women who have a first- 
degree female relative with EC [9,10], while among older women 
the association is less strong. In addition, there is an increased risk 
of EC in women with Lynch syndrome [11], a hereditary 
autosomal dominant condition that confers a high risk of 
colorectal cancer as well. These observations suggest that germline
genetics may contribute to EC susceptibility. 

Genome-wide association studies (GWAS) have successfully 
identified more than a hundred susceptibility loci for a variety of
cancer types [12]. Three GWAS of EC have been
conducted to date, with only one identifying a novel genome-wide
significant locus, rs4430796 (p = 7.1 x 10^-10), associated with EC
[13] at the HNF1B gene region on chromosome 17q12. Two
independent studies subsequently replicated the association with
rs4430796 [14,15]. However, two other GWAS of EC
[14,16] were not able to identify additional genome-wide
significant loci, suggesting that common variants with large effects
may not contribute greatly to the familial risk of EC.

Most risk alleles discovered through GWAS have modest effect 
sizes that do not account for much heritability of common diseases
[17]. Moreover, GWAS have focused on common variants
(minor allele frequency [MAF] > 5%) in the general population. Low frequency variants make up
a large fraction of genetic variation in humans and may explain a 
substantial portion of the heritability in cancer etiology. Recent 
exome-sequencing studies have found rare variants in candidate 
susceptibility genes for familial colorectal cancer [18], breast 
cancer [19], and prostate cancer [20], suggesting that analysis of 
rare variants may also provide insight into the etiology of EC. 
However, exome sequencing remains too costly to apply at the
sample sizes that large epidemiological studies need in order
to achieve sufficient statistical power.

There has been a push to develop statistically powerful, yet 
relatively inexpensive, methods to detect associations for rare 
variants with larger effect sizes. Illumina has recently developed
the Infinium HumanExome BeadChip (exome array) from non- 
synonymous variants found at least 3 times on more than 2 data 
sets from the whole-exome sequencing of more than 12,000 
individuals. This array provides a platform from which we can 
begin to survey the landscape of rare variation in a large number 
of samples. 

We genotyped rare variants in a multiethnic population of 3,067 
women (1,169 EC cases and 1,898 controls) from the Epidemi- 
ology of Endometrial Cancer Consortium (E2C2) [21] in order to 
test the hypothesis that rare variants in coding regions may be 
associated with EC risk. 

Methods 

The ethics committee of each participating study (Alberta Health
Services; Estrogen, Diet, Genetics and Endometrial Cancer Study;
Multiethnic Cohort Study) obtained written informed consent
from all study participants. All written consent was approved by
the Institutional Review Board (IRB) of each institution
(Alberta Health Services, Canada; Memorial Sloan Kettering, 
USA; University of Hawaii Cancer Center, USA; Keck School of 
Medicine-University of Southern California, USA). 

Alberta Health Services, Memorial Sloan Kettering, University 
of Hawaii Cancer Center, and University of Southern California 
institutional review boards specifically approved the present study 
(Exome-Wide Association Study of Endometrial Cancer), as well 
as the written consent obtained from participants. 
Figure 1. Minor allele frequency for all variants successfully
genotyped over all ethnicities. The number of variants is plotted by
the minor allele frequency over all ethnicities. These variants include 
those that are monomorphic in all ethnicities. 
doi:10.1371/journal.pone.0097045.g001 

Participating studies also obtained IRB certification, permitting 
data sharing according to the NIH Policy for Sharing of Data
Obtained in NIH Supported or Conducted Genome-Wide
Association Studies (GWAS).

Study Population 

Exome array genotyping was performed on 3,067 samples from 
3 retrospective case-control studies: the Alberta Health Services 
Study (AHS) [22], the Estrogen, Diet, Genetics and Endometrial 
Cancer study (EDGE) [23], and the Fred Hutchinson Cancer
Research Center (FHCRC) study, plus 1 case-control study nested
within the prospective Multiethnic Cohort Study (MEC) [24].
Studies participating in this analysis are described in Table 1 and 
in our previous GWAS [14]. Of the women included in the study, 
1,169 were EC cases and 1,898 were controls. Cases were 
restricted to those diagnosed with the most common subtype of EC 
(type I) while controls were cancer free and had an intact uterus. 
Controls were matched to cases by age and study site. 
Genotyping and Quality Control

DNA was extracted at each study site from buffy coat or cheek-
cell samples following the manufacturer's protocol and genotyped
at the University of Southern California using the Infinium
HumanExome BeadChip (Illumina Inc., San Diego, CA) as part
of the Stage II replication of the E2C2 GWAS. The BeadChip 
included 9,232 custom markers, 2,211 of which are specifically 
relevant to EC, in addition to the 247,870 markers coding 
primarily for protein-altering variants already included in the 
BeadChip's default design. 

Genotype calling was performed with Illumina GenCall on all
samples (n = 3,067) using the MEC cluster file (16,000
multiethnic samples) for the non-custom markers and autocluster-
ing for the custom markers. Variants were excluded from analyses
if call rates were < 90% (n = 115), the variant was monomorphic
(n = 77,521), the locus had no observed founders and was missing all
genotypes (n = 1,962), the variant was an insertion or deletion
allele (n = 117), or the variant deviated from Hardy-Weinberg
equilibrium at p-value < 0.0001 in any ethnic group (n = 248).
Figure 2. Minor allele frequency for all variants successfully genotyped by reported ethnicity. The number of variants is plotted by the
minor allele frequency for each ethnicity. All these variants are polymorphic in at least one reported ethnicity.
doi:10.1371/journal.pone.0097045.g002

The final disease trait analysis data set contained 177,139 
successfully genotyped variants. 
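
The variant accounting above can be checked with a short tally. The sketch below is illustrative only: it uses the counts reported in this section and assumes that the exclusion categories do not overlap, which the text does not state explicitly but which reproduces the reported total. The names are hypothetical.

```python
# Variant QC tally using the counts reported in the text.
# Assumption: each excluded variant is counted in exactly one category.
TOTAL_ARRAY_VARIANTS = 9_232 + 247_870  # custom markers + default-design markers

EXCLUSIONS = {
    "call rate < 90%": 115,
    "monomorphic": 77_521,
    "no observed founders and all genotypes missing": 1_962,
    "insertion or deletion allele": 117,
    "Hardy-Weinberg p < 0.0001 in any ethnic group": 248,
}


def remaining_variants(total, exclusions):
    """Subtract every exclusion count from the total number of array variants."""
    return total - sum(exclusions.values())


final_variants = remaining_variants(TOTAL_ARRAY_VARIANTS, EXCLUSIONS)
print(TOTAL_ARRAY_VARIANTS, final_variants)  # 257102 177139
```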

In total, 3,031 out of 3,067 samples were successfully genotyped
with call rates ≥ 90%. Of these, we removed 40 duplicate samples
(genotype concordance rate > 99.9%) used for assay quality
control and 15 samples for other quality control reasons. We
conducted principal components analysis (PCA) to identify self-
reported ethnicity outliers and infer ancestry with EIGENSOFT
v4.2 [25] using 47,097 custom and non-custom SNPs with
genotyping rates > 90% and MAF > 1%. The HapMap phase II
(build 37) CEU, YRI, and JPT-CHB samples were used as
reference populations. Using the first 5 principal components, we
identified 7 individuals who were ethnicity outliers and excluded
them from analyses. After further removal of 136 outliers (more
than 3.5 standard deviations from the mean) of sample heterozy-
gosity by ethnicity, 2,833 women (1,055 EC cases and 1,778
controls) remained for disease trait analysis.
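
The heterozygosity filter and the sample accounting described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input dictionaries, and the use of the sample standard deviation are assumptions. Only the 3.5 standard deviation cutoff and the sample counts come from the text.

```python
from statistics import mean, stdev


def heterozygosity_outliers(het_by_sample, ethnicity_by_sample, n_sd=3.5):
    """Return the IDs of samples whose heterozygosity lies more than n_sd
    standard deviations from the mean of their own ethnic group."""
    groups = {}
    for sample, ethnicity in ethnicity_by_sample.items():
        groups.setdefault(ethnicity, []).append(sample)

    outliers = set()
    for members in groups.values():
        if len(members) < 2:
            continue  # standard deviation is undefined for a single sample
        values = [het_by_sample[s] for s in members]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # no spread, so nothing can be an outlier
        outliers.update(
            s for s in members if abs(het_by_sample[s] - mu) > n_sd * sd
        )
    return outliers


# Sample accounting with the counts reported in the text.
GENOTYPED = 3_031
REMOVED = {
    "duplicates": 40,
    "other quality control reasons": 15,
    "PCA ethnicity outliers": 7,
    "heterozygosity outliers": 136,
}
analyzed = GENOTYPED - sum(REMOVED.values())
print(analyzed)  # 2833
```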

Statistical Analysis 

Single variant association analysis. Single variant analyses 
were performed overall and stratified by self-reported ethnic 
group. For each SNP, we estimated odds ratios (OR) and 95% 
confidence intervals (CI) using unconditional logistic regression, 
assuming an additive genetic model (0, 1, 2 copies of the minor 
allele) and adjusting for body mass index (BMI in kg/m^2), age,
study site, plate, and the first 4 principal components to account 
for population stratification. All single variant analyses were 
performed using PLINK v 1.07 [26]. 
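
As a rough illustration of the additive coding and of how a per-allele odds ratio relates to a logistic regression coefficient, consider the sketch below. It assumes a Wald-type 95% interval, exp(beta ± 1.96 × SE). The function names and the example coefficient are hypothetical and are not taken from the study or from PLINK.

```python
import math


def additive_dosage(genotype, minor_allele):
    """Count the copies (0, 1, or 2) of the minor allele in a two-letter genotype."""
    return sum(1 for allele in genotype if allele == minor_allele)


def odds_ratio_ci(beta, se, z=1.959964):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with a Wald-type confidence interval (95% by default)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical per-allele coefficient, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=math.log(1.5), se=0.10)
print(additive_dosage("AG", "G"), round(or_, 2), lower < or_ < upper)
```

Under the additive model, each extra copy of the minor allele multiplies the odds of disease by the same factor, so a heterozygote carries one multiple of the odds ratio and a minor-allele homozygote carries two.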



Gene-based analysis. As an additional method to discover 
rare variants associated with EC, gene-based testing was 
performed using SKAT-O [27] over all ethnicities. SKAT-O
combines gene-burden tests and SKAT, a SNP-set level test for
association using kernel machine methods, both of which it includes
as special cases, into an optimized approach that maximizes power. These analyses were
also adjusted for BMI, age, study site, plate and the first 4 principal 
components. In total, 16,245 genes with at least one variant were 
tested. 

Statistical significance. We considered a single variant asso-
ciation to reach global significance if the unadjusted p-value was
< 2.82 x 10^-7, corresponding to a Bonferroni correction for
177,139 tests. Gene-based associations were considered significant
for unadjusted p-values < 3.08 x 10^-6, corresponding to a
Bonferroni correction for 16,245 tests.
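
Both thresholds follow from dividing a family-wise error rate by the number of tests. The sketch below assumes an error rate of 0.05, which is not stated in the text but reproduces the reported values. The function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


single_variant = bonferroni_threshold(177_139)  # single variant tests
gene_based = bonferroni_threshold(16_245)       # gene-based tests
print(f"{single_variant:.2e}", f"{gene_based:.2e}")  # 2.82e-07 3.08e-06
```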

In accordance with NIH/NCI policy, all data will be submitted to
the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap).

Results 

Association analyses included 177,139 successfully genotyped 
variants with MAF > 0 from a total of 257,102 variants included 
in the array. Population characteristics of the four participating 
studies (AHS, EDGE, FHCRC, and MEC) are described in
Table 1. Mean age at diagnosis for cases ranged from 58.5 years in
AHS to 65.5 years in MEC and mean BMI at diagnosis for cases
ranged from 28.8 kg/m^2 in MEC to 32.3 kg/m^2 in AHS and
EDGE. Of the 3,067 samples genotyped, 2,833 were included in
the analysis. There were no differences in age, BMI, and ethnicity
between excluded cases and those included in the analysis (results
not shown). Of these 2,833 individuals, there were 254 self-
reported African-Americans, 347 self-reported Asians, 1,686 self-
reported Caucasians, 79 self-reported Hawaiians, 360 self-reported
Latinas, and 107 who did not report a specific ethnicity (Table 2).

Figure 3. Six-way Venn diagram showing polymorphic putative functional variants shared by reported ethnicities. Numbers of shared
variants are shown at intersections. The total numbers of polymorphic variants by ethnicity are listed in the upper-left hand corner.
doi:10.1371/journal.pone.0097045.g003

Variant Distribution among Reported Ethnicities 

In this study population, 77,521 variants (30.4%) were found to 
be monomorphic across all reported ethnicities and 177,139 
variants (69.6%) were polymorphic in at least one ethnic 
population with 74.0% of polymorphic alleles having MAF 
≤ 1% (Figure 1). Of the variants that were polymorphic in at
least one ethnic population, 42.0% in African Americans, 71.7% 
in Asians, 34.9% in Caucasians, 69.7% in Hawaiians, 49.5% in 
Latinas, and 60.0% in those of unknown ethnicity were 
monomorphic (Figure 2). The MAF distributions were fairly 
similar among Asians, Hawaiians, and those who did not report a 
specific ethnicity while African Americans, Caucasians, and 
Latinas shared more similarities in MAF with each other than 
with Asians, Hawaiians, and those of unknown ethnicity. About 
20.2% (n = 35,912) of variants were shared by all 5 reported 
ethnicities while Caucasians and Latinas had the most variants in 
common at 41.1% (n = 72,878) (Figure 3). Caucasians had the 
most unique polymorphic variants (18.7%), followed by African- 
Americans (14.0%), Latinas (3.2%), Asians (2.7%), those who did 
not report ethnicity (1.0%), and Hawaiians (0.4%). 
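
The monomorphic and polymorphic categories used above can be derived from genotype counts. The sketch below is a minimal illustration with hypothetical function names and toy counts. Only the 1% cutoff and the variant totals come from the text.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes AA, AB, and BB."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no called genotypes")
    freq_b = (2 * n_bb + n_ab) / total_alleles
    return min(freq_b, 1 - freq_b)


def classify(maf, cutoff=0.01):
    """Label a variant as monomorphic, at or below the MAF cutoff, or above it."""
    if maf == 0:
        return "monomorphic"
    return "MAF <= 1%" if maf <= cutoff else "MAF > 1%"


# Share of monomorphic and polymorphic variants, using the reported counts.
monomorphic, polymorphic = 77_521, 177_139
pct_monomorphic = round(100 * monomorphic / (monomorphic + polymorphic), 1)
pct_polymorphic = round(100 * polymorphic / (monomorphic + polymorphic), 1)
print(pct_monomorphic, pct_polymorphic)  # 30.4 69.6
```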



Single Variant Association for Endometrial Cancer 

No variants reached global significance in single variant
association of EC for all ethnicities combined (Figure 4a,
Table 3) when correcting for multiple comparisons using
the Bonferroni adjustment (p < 2.82 x 10^-7). The strongest
associations were for variants with MAF > 0.05 (Table 3) located
within 50 kb of the long non-protein coding intergenic RNA,
LINC00520 (rs1953358, OR = 1.36, p = 4.76 x 10"') and in the
intron region of PROS1 (rs8178648, OR = 1.71, p = 1.53 x
10^-6), which codes for protein S, a cofactor to protein C in the
anti-coagulation pathway. In Caucasians, who make up the
majority of the overall analysis, only rs8178648 remained
suggestively associated with OR = 1.98 and p = 3.35 x 10"''
(Figure 4b, Table 3). There were no globally significant or
suggestive variants in African Americans, Asians, Hawaiians,
Latinas, and those who did not report ethnicity (Table S1).

Gene-based Analysis of Endometrial Cancer 

None of the gene-based tests of association were globally
significant (p < 3.08 x 10^-6) after adjusting for multiple
comparisons (Table S2). Of the 16,245 genes tested, the most
significant EC association was with KRT81 (p = 2.21 x 10~*), a
member of the keratin gene family located on 12q13. PROS1,
where rs8178648 is located, was not significantly associated with
EC when testing over all ethnicities (p = 0.6789) or when
testing only in Caucasians (results not shown).
Figure 4. Manhattan plots for the endometrial cancer association analysis. Results of single variant analyses (-log10 p) are plotted against
chromosome position (NCBI build 37) for association over all ethnicities (A) and for associations within Caucasians (B). Suggestive variants are labeled
above. Results were adjusted for age at diagnosis, BMI, study site, plate, and the first four principal components.
doi:10.1371/journal.pone.0097045.g004
Discussion

We present an initial exploration into whether rare variants are 
associated with EC risk in a multiethnic population from the 
E2C2. No variants reached global significance (p < 2.82 x 10^-7)
in the single variant association analyses of EC in all ethnicities
combined or when stratified by reported ethnicity. Additionally,
no gene-based test of association reached global significance
(p < 3.08 x 10^-6).

Among all ethnicities, rs8178648 on chromosome 3 maintained
a suggestive association with EC (OR = 1.707, 95% CI: 1.363-
2.123, p = 1.53 x 10^-6). The variant lies within the intron region
of PROS1, a gene that codes for protein S, a cofactor in the
anticoagulant pathway, and that causes autosomal dominant hereditary
thrombophilia when mutated [28]. PROS1 expression has been
reported to be elevated in aggressive prostate cancer tissue [29]
and thyroid cancer tissue [30], suggesting it may have a role in
cancer etiology or progression. PROS1 has been found to be
directly upregulated by progestins [31] and downregulated by
17β-estradiol, an estrogen that regulates gene expression via the
estrogen receptor [32], making it susceptible to imbalances in the
sex hormone metabolic pathway, which is implicated in EC
etiology. However, PROS1 was not significantly associated with
EC (p = 0.6789) when using SKAT-O and no other GWAS have
found significant or suggestive variants in this gene.

One weakness of this study is our limited sample size, which was 
not sufficiently powered to detect rare variants with modest effects 
associated with EC. Additionally, the exome array content is 
predominantly based on European ancestry whereas our study 
included a substantial number of samples with other ancestries. 
Incomplete exome array coverage of all functional variants and 
indels that may impact EC risk may also have limited the scope of 
our study. However, our analysis is one of only two studies [33] 
using the exome array to examine associations between rare 
variants and complex diseases in large multiethnic populations. 
Our study is also the first to utilize the exome array with EC and 
serves as an extension to our previous examination of common 
variants on EC risk. 

A previous GWAS [13] identified one novel locus near HNF1B,
rs4430796, inversely associated with EC risk. We replicated the 
findings in our GWAS [14], but no other common variants 
associated with EC have been determined. Exome arrays that 
focus on rare variants, which are hypothesized to have larger effect 